Simulation Engine

The Simulation Engine lets you test your agent against synthetic conversations before deploying to real users. It generates realistic test inputs based on your agent’s role and goal, runs them against the live agent, and scores the results across quality and safety metrics.

How it works

1. Open an agent and go to Safety and Evaluations > Simulation Engine.
2. Define or auto-generate scenarios: situations the agent should handle. Examples: “angry customer demanding a refund,” “user asking an out-of-scope question.”
3. Define or auto-generate personas: user types the agent will encounter. Examples: “non-technical user,” “enterprise decision-maker,” “hostile adversarial user.”
4. The engine automatically combines scenarios and personas into test cases.
5. Run the simulation. The engine executes each test case and scores the results.
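
This page does not say exactly how scenarios and personas are combined. One plausible reading is a full cross product, where every scenario is run once with every persona. A minimal sketch under that assumption follows; `SimulationCase` and `build_test_cases` are hypothetical names, not part of any Lyzr API.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SimulationCase:
    # Hypothetical container for one generated test case.
    scenario: str
    persona: str


def build_test_cases(scenarios: list[str], personas: list[str]) -> list[SimulationCase]:
    """Pair every scenario with every persona (a Cartesian product)."""
    return [SimulationCase(s, p) for s, p in product(scenarios, personas)]


scenarios = [
    "angry customer demanding a refund",
    "user asking an out-of-scope question",
]
personas = [
    "non-technical user",
    "enterprise decision-maker",
    "hostile adversarial user",
]

cases = build_test_cases(scenarios, personas)
print(len(cases))  # 6
```

Under this assumption the number of test cases grows multiplicatively: 2 scenarios and 3 personas yield 6 cases, and each added persona adds one case per scenario.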

Scoring metrics

Metric	What it measures
Task Completion	Did the agent accomplish what the user asked?
Hallucination	Did the agent fabricate facts not present in its knowledge?
Faithfulness	Is the response grounded in the connected Knowledge Base?
Toxicity	Did the agent produce harmful content?
Bias	Did the agent treat any group unfairly?
Tool Accuracy	Did the agent call the right tool with the correct arguments?
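
If each test case is scored pass or fail on each metric, per-metric rates are simple ratios. The sketch below assumes that representation (the engine may instead report continuous scores), and `pass_rates` is a hypothetical helper. A `True` value means the case passed that check, so for Hallucination it means no fabricated facts were detected.

```python
from collections import defaultdict

METRICS = [
    "Task Completion",
    "Hallucination",
    "Faithfulness",
    "Toxicity",
    "Bias",
    "Tool Accuracy",
]


def pass_rates(results: list[dict[str, bool]]) -> dict[str, float]:
    """Return the fraction of scored test cases that passed each metric.

    Each result maps a metric name to True (passed) or False (failed).
    A metric missing from a result (e.g. Tool Accuracy for a case with
    no tool calls) is skipped for that case, not counted as a failure.
    """
    passed: dict[str, int] = defaultdict(int)
    scored: dict[str, int] = defaultdict(int)
    for result in results:
        for metric, ok in result.items():
            scored[metric] += 1
            passed[metric] += int(ok)
    # Report only known metrics that were scored at least once.
    return {m: passed[m] / scored[m] for m in METRICS if scored[m]}


results = [
    {"Task Completion": True, "Hallucination": True, "Toxicity": True},
    {"Task Completion": False, "Hallucination": True, "Toxicity": True, "Tool Accuracy": True},
    {"Task Completion": True, "Hallucination": False, "Toxicity": True},
    {"Task Completion": True, "Hallucination": True, "Toxicity": True, "Tool Accuracy": False},
]
print(pass_rates(results))
# {'Task Completion': 0.75, 'Hallucination': 0.75, 'Toxicity': 1.0, 'Tool Accuracy': 0.5}
```

Skipping unscored metrics keeps Tool Accuracy from being diluted by conversations that never call a tool; whether the engine handles this the same way is an assumption.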

Agent Hardening

When test cases fail, select them and choose Agent Hardening. The engine analyzes the failure patterns and recommends changes to the agent’s instructions, model selection, or feature configuration (for example, enabling Reflection for an agent that is hallucinating). Review the recommendations, apply them to the agent, and re-run the simulation to confirm improvement.

Before going to production

Run the Simulation Engine until the agent meets your quality bar. A reasonable threshold for most production agents is 90% or higher task completion, zero toxicity failures, a hallucination rate below your acceptable limit, and all tool calls passing Tool Accuracy (the right tool, called with the correct arguments). The Simulation Engine is the primary quality gate before promoting any agent to a production environment.
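
The suggested thresholds can be written down as an explicit gate, so that a run either clears the bar or reports why it did not. This is a sketch under assumptions: `SimulationSummary` and `ready_for_production` are hypothetical, the counts are assumed to come from one simulation run, and the hallucination limit is a value you choose.

```python
from dataclasses import dataclass


@dataclass
class SimulationSummary:
    # Hypothetical per-run counts; field names are illustrative only.
    total_cases: int
    task_completed: int      # cases that passed Task Completion
    toxicity_failures: int   # cases that failed Toxicity
    hallucinations: int      # cases that failed Hallucination
    tool_call_failures: int  # cases that failed Tool Accuracy


def ready_for_production(
    s: SimulationSummary, hallucination_limit: float
) -> tuple[bool, list[str]]:
    """Apply the suggested gate and return (passed, reasons_for_failure)."""
    if s.total_cases == 0:
        return False, ["no test cases were run"]
    reasons: list[str] = []
    completion = s.task_completed / s.total_cases
    if completion < 0.90:  # require 90% or higher task completion
        reasons.append(f"task completion {completion:.0%} is below 90%")
    if s.toxicity_failures > 0:  # require zero toxicity failures
        reasons.append(f"{s.toxicity_failures} toxicity failure(s)")
    hallucination_rate = s.hallucinations / s.total_cases
    if hallucination_rate >= hallucination_limit:  # rate must be below the limit
        reasons.append(
            f"hallucination rate {hallucination_rate:.1%} is not below {hallucination_limit:.1%}"
        )
    if s.tool_call_failures > 0:  # require every tool call to be correct
        reasons.append(f"{s.tool_call_failures} tool-call failure(s)")
    return not reasons, reasons


summary = SimulationSummary(
    total_cases=50, task_completed=46, toxicity_failures=0,
    hallucinations=1, tool_call_failures=0,
)
print(ready_for_production(summary, hallucination_limit=0.05))  # (True, [])
```

Returning the list of reasons, not just a boolean, shows which threshold blocked promotion.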
